Assignment 4: 3D Gaussian Splatting¶

Author: Mukai (Tom Notch) Yu

Email: mukaiy@andrew.cmu.edu

I trained everything on an A100-80G.

1. 3D Gaussian Splatting¶

1.1.5 Perform Splatting¶

q1.1.5
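The splatting step composites depth-sorted 2D Gaussians front to back. A minimal numpy sketch of the per-pixel compositing (array names are hypothetical; the assignment code operates on batched torch tensors):

```python
import numpy as np

def composite_front_to_back(alphas, colors, depths):
    """Front-to-back alpha compositing of per-Gaussian contributions
    at a single pixel. `alphas` (N,), `colors` (N, 3), `depths` (N,)
    are hypothetical per-pixel arrays; the real renderer batches this
    over all pixels at once."""
    order = np.argsort(depths)           # sort Gaussians near-to-far
    transmittance = 1.0
    out = np.zeros(3)
    for i in order:
        out += transmittance * alphas[i] * colors[i]
        transmittance *= 1.0 - alphas[i]  # light absorbed so far
    return out
```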

1.2.2 Perform Forward Pass and Compute Loss¶

progress

"wriggling gaussians"

final

Evaluation --- Mean PSNR: 28.489

Evaluation --- Mean SSIM: 0.930
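For reference, the PSNR reported above is just log-scaled MSE; a minimal sketch of the metric (SSIM is more involved and omitted):

```python
import numpy as np

def psnr(img, ref, max_val=1.0):
    """Peak signal-to-noise ratio between a rendered and a reference
    image, with pixel values assumed to lie in [0, max_val]."""
    mse = np.mean((img - ref) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```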

parameters = [
    {"params": [gaussians.pre_act_opacities], "lr": 0.05, "name": "opacities"},
    {"params": [gaussians.pre_act_scales], "lr": 0.01, "name": "scales"},
    {"params": [gaussians.colors], "lr": 0.05, "name": "colors"},
    {"params": [gaussians.means], "lr": 0.001, "name": "means"},
]
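Each group above carries its own learning rate. A toy numpy stand-in showing how a per-group update consumes this structure (the actual training presumably passes `parameters` to `torch.optim.Adam`; the plain `sgd_step` below is only illustrative):

```python
import numpy as np

# Hypothetical stand-ins for the Gaussian attributes; the real code
# uses torch tensors with requires_grad=True.
groups = [
    {"params": [np.zeros(3)], "lr": 0.05, "name": "opacities"},
    {"params": [np.zeros(3)], "lr": 0.001, "name": "means"},
]

def sgd_step(groups, grads):
    """One plain gradient-descent step that honors per-group learning
    rates, mimicking how torch optimizers treat parameter groups."""
    for g in groups:
        for p, dp in zip(g["params"], grads[g["name"]]):
            p -= g["lr"] * dp  # in-place update, scaled by group lr
```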

1k iterations took 19 min 29 s. The final loss of 0.008 was already reached after about 100 iterations, so far fewer iterations would suffice.

1.3.1 Rendering Using Spherical Harmonics¶

Not view-dependent | View-dependent

Not view-dependent | View-dependent

I think the difference is quite evident: the velvet on the chair has a complicated BRDF, resulting in pseudo-shadows.

Besides, the golden imprints appear more specular.

Not view-dependent | View-dependent

There is also a pseudo-shadow at the edge of the chair.
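For context, the view dependence comes from evaluating spherical harmonics along the viewing direction. A degree-1 numpy sketch (the coefficient layout is an assumption; real renderers also store higher bands):

```python
import numpy as np

# Standard SH constants for bands 0 and 1.
SH_C0 = 0.28209479177387814
SH_C1 = 0.4886025119029199

def sh_to_color(sh_coeffs, view_dir):
    """Evaluate degree-1 spherical harmonics for one Gaussian.
    `sh_coeffs` has shape (4, 3): one DC term plus three band-1
    terms, each with an RGB coefficient (layout is an assumption)."""
    x, y, z = view_dir / np.linalg.norm(view_dir)
    color = SH_C0 * sh_coeffs[0]          # view-independent base color
    color += -SH_C1 * y * sh_coeffs[1]    # band-1 terms modulate the
    color += SH_C1 * z * sh_coeffs[2]     # color with the view
    color += -SH_C1 * x * sh_coeffs[3]    # direction
    return color
```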

2. Diffusion-guided Optimization¶

2.1 SDS Loss + Image Optimization¶

Without guidance (2000 iterations) | With guidance (2000 iterations), prompt: "a hamburger"

Without guidance (2000 iterations) | With guidance (2000 iterations), prompt: "a standing corgi dog"

Without guidance (2000 iterations) | With guidance (2000 iterations), prompt: "diffusion"

Without guidance (2000 iterations) | With guidance (2000 iterations) (this one clearly failed)

Prompt: "a snake poking its head out of a water bottle like Aquarius, while its body swirls at the bottom of the bottle"
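The SDS update that drives the guided results above can be sketched in a few lines. A numpy toy with $w(t) = 1$, where `predict_noise` is a stand-in for the frozen, text-conditioned diffusion U-Net:

```python
import numpy as np

rng = np.random.default_rng(0)

def sds_step(x, predict_noise, alpha_bar_t, lr=0.1):
    """One SDS update on an image/latent x (numpy sketch of the usual
    PyTorch loop). The SDS gradient (eps_pred - eps) is applied
    directly to x, skipping the U-Net Jacobian."""
    eps = rng.standard_normal(x.shape)                   # sample noise
    x_t = np.sqrt(alpha_bar_t) * x + np.sqrt(1.0 - alpha_bar_t) * eps
    eps_pred = predict_noise(x_t)                        # frozen denoiser
    grad = eps_pred - eps                                # w(t) = 1 here
    return x - lr * grad                                 # update x only
```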

2.2 Texture Map Optimization for Mesh¶

hamburger

A hamburger

shire

A village like Shire in the lord of the rings

I think random viewpoints and light sources worsen the result, because we cannot render or "imprint" the image onto the mesh from a particular viewpoint, but only from an "averaged" viewpoint. Eventually, we are only "imprinting" the dominant color.

The geometry is fixed, and we do not encode viewpoint information in the text prompt when generating the latents.
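A toy illustration of the averaging argument: optimizing a single texel against several view colors simultaneously converges to their mean, i.e. the dominant/average color (the colors below are hypothetical):

```python
import numpy as np

def optimized_texel(view_colors, lr=0.5, steps=50):
    """Gradient descent on one texel under an L2 loss against every
    view's target color at once; the minimizer is the mean color."""
    texel = np.zeros(3)
    for _ in range(steps):
        # Average L2 gradient over all views writing to this texel.
        grad = np.mean([texel - c for c in view_colors], axis=0)
        texel -= lr * grad
    return texel
```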

2.3 NeRF Optimization¶

Hyperparameters: $\lambda_{entropy} = 10^{-2}$, $\lambda_{orient} = 10^{-3}$, latent iteration ratio: $20\%$
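The two regularizers weighted above can be sketched as follows (numpy toy; the orientation term follows the DreamFusion-style penalty on normals facing away from the camera, and the sign convention here is an assumption):

```python
import numpy as np

def entropy_loss(weights, eps=1e-6):
    """Binary-entropy regularizer on ray blending weights, pushing
    them toward 0 or 1 (scaled by lambda_entropy above)."""
    w = np.clip(weights, eps, 1.0 - eps)
    return float(np.mean(-w * np.log(w) - (1 - w) * np.log(1 - w)))

def orientation_loss(normals, view_dirs, weights):
    """Penalize normals that face away from the camera, weighted by
    the blending weights (scaled by lambda_orient above). The sign
    convention of `view_dirs` is an assumption."""
    cos = np.sum(normals * view_dirs, axis=-1)
    return float(np.mean(weights * np.maximum(0.0, cos) ** 2))
```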

corgi

A standing corgi dog

Final loss after 100 epochs (2291.58 s): 0.445

hamburger

A hamburger

Final loss after 100 epochs (2438.83 s): 0.315

snake

A Slytherin snake

Final loss after 100 epochs (2288.24 s): 0.648

(pretty sure this is not a snake)

2.4.1 View-dependent text embedding¶

corgi

A standing corgi dog

Final loss after 100 epochs (2293.36 s): 0.563

hamburger

A hamburger

Final loss after 100 epochs (2300.36 s): 0.769

snake

A Slytherin snake

Final loss after 100 epochs (2300.79 s): 0.439

The view-dependent text encoding fixed the missing-head problem with the snake. This is because supervising from multiple labeled views can reveal otherwise occluded geometry.

However, I think this is still a hacky solution, because simply appending the view text does not guarantee that the diffusion model understands or attends to different views, in either a vision or a language sense. This is evident in the corgi experiments, where we got better results even without the view-dependent text embedding.
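The view-dependent embedding itself is just a prompt suffix chosen from the camera azimuth; a sketch of the usual mapping (the angle thresholds are assumptions, and real implementations also check elevation for an overhead view):

```python
def view_suffix(azimuth_deg, front_half_angle=60.0):
    """Map camera azimuth (degrees) to a text suffix appended to the
    prompt, e.g. "A standing corgi dog, front view". Thresholds are
    assumptions for illustration."""
    a = azimuth_deg % 360.0
    if a < front_half_angle or a > 360.0 - front_half_angle:
        return ", front view"
    if abs(a - 180.0) < front_half_angle:
        return ", back view"
    return ", side view"
```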